Adaptation of Graph-Based Semi-Supervised Methods to Large-Scale Text Data
Abstract
Graph-based semi-supervised learning methods, which propagate labels along edges between neighboring nodes, have been shown to be efficient and effective on network data. These methods can also be applied to general data by constructing a graph whose nodes are the instances and whose edges are weighted by the similarity between the instances' feature vectors. However, whereas a natural network is often sparse, a network of pairwise similarities between instances is dense, and prohibitively large for even moderately sized text datasets. Using a simple, general technique, we show how these learning methods can be applied exactly and efficiently to text data over the complete pairwise similarity manifold, without resorting to sampling or sparsification. This technique also provides a unifying view of prior work on label propagation on text graphs, and we assess its effectiveness when applied to two popular graph-based semi-supervised methods on several large real-world datasets.
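The abstract does not spell out the technique. One well-known way to get exact, efficient propagation over a dense similarity graph, assuming the similarity between two documents is the inner product of their sparse feature vectors (e.g. L2-normalized TF-IDF, which gives cosine similarity), is to never build the n x n matrix W = F F^T and instead compute every product W v as F (F^T v). That costs time proportional to the number of nonzero features rather than n^2. The sketch below plugs this factorization into a basic iterative label-propagation loop (degree-normalized neighbor averaging with seed labels clamped). It is a minimal illustration under those assumptions, not necessarily the paper's method; the function names `implicit_matvec` and `label_propagation`, the dict-based sparse vectors, and the toy corpus are all hypothetical.

```python
from collections import defaultdict


def implicit_matvec(docs, v, drop_self=True):
    """Return W v, where W[i][j] = <docs[i], docs[j]>, without building W.

    docs: list of sparse feature vectors, each a dict {feature: weight}.
    v:    list of floats, one per document.
    Cost is O(nnz(F)) instead of O(n^2), because W v = F (F^T v).
    """
    # Step 1: s = F^T v, a sparse vector indexed by feature.
    s = defaultdict(float)
    for f_i, v_i in zip(docs, v):
        if v_i != 0.0:
            for feat, w in f_i.items():
                s[feat] += w * v_i
    # Step 2: (F s)_i = <f_i, s> for every document i.
    out = []
    for f_i, v_i in zip(docs, v):
        total = sum(w * s.get(feat, 0.0) for feat, w in f_i.items())
        if drop_self:
            # Remove the self-loop term W[i][i] * v_i = ||f_i||^2 * v_i.
            total -= sum(w * w for w in f_i.values()) * v_i
        out.append(total)
    return out


def label_propagation(docs, seeds, classes, iters=20):
    """Iteratively average neighbor scores, clamping the seed labels.

    seeds:   dict {doc_index: class_label} for the labeled documents.
    classes: list of class labels; ties are broken by list order.
    """
    n = len(docs)
    degree = implicit_matvec(docs, [1.0] * n)  # D = diag(W 1), never forming W
    clamp = {c: [1.0 if seeds.get(i) == c else 0.0 for i in range(n)] for c in classes}
    scores = {c: list(clamp[c]) for c in classes}
    for _ in range(iters):
        for c in classes:
            propagated = implicit_matvec(docs, scores[c])  # W y_c
            scores[c] = [
                clamp[c][i] if i in seeds
                else (propagated[i] / degree[i] if degree[i] > 1e-12 else 0.0)
                for i in range(n)
            ]
    return [max(classes, key=lambda c: scores[c][i]) for i in range(n)]


# Toy corpus (hypothetical): bag-of-words vectors as sparse dicts.
docs = [
    {"ball": 1.0, "goal": 1.0},     # seed: sports
    {"goal": 1.0, "team": 1.0},
    {"vote": 1.0, "party": 1.0},    # seed: politics
    {"party": 1.0, "senate": 1.0},
]
print(label_propagation(docs, {0: "sports", 2: "politics"}, ["sports", "politics"]))
# → ['sports', 'sports', 'politics', 'politics']
```

Each unlabeled document picks up the class of the seed it shares a term with. Subtracting the self-similarity term keeps a document's own vector out of its neighborhood, which matches a similarity graph without self-loops while still never materializing the dense matrix.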
Similar resources
Efficient Graph-Based Semi-Supervised Learning of Structured Tagging Models
We describe a new scalable algorithm for semi-supervised training of conditional random fields (CRF) and its application to part-of-speech (POS) tagging. The algorithm uses a similarity graph to encourage similar n-grams to have similar POS tags. We demonstrate the efficacy of our approach on a domain adaptation task, where we assume that we have access to large amounts of unlabeled data from the...
Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
Graph-based semi-supervised learning (SSL) algorithms have been successfully used to extract class-instance pairs from large unstructured and structured text collections. However, a careful comparison of different graph-based SSL algorithms on that task has been lacking. We compare three graph-based SSL algorithms for class-instance acquisition on a variety of graphs constructed from different ...
Large-Scale Graph-based Semi-Supervised Learning via Tree Laplacian Solver
Graph-based semi-supervised learning is one of the most popular and successful semi-supervised learning methods. Typically, it predicts the labels of unlabeled data by minimizing a quadratic objective induced by the graph, which is unfortunately a procedure of polynomial complexity in the sample size n. In this paper, we address this scalability issue by proposing a method that approximately so...
Graph-based semi-supervised learning with multi-modality propagation for large-scale image datasets
Semi-supervised learning (SSL) is widely used to explore the vast amount of unlabeled data in the world. Over the past decade, graph-based SSL has become popular in automatic image annotation because it can learn globally from local similarities. However, recent studies have shown that the emergence of large-scale datasets challenges the traditional methods. On the other hand, most previous w...
Incremental Spectral Sparsification for Large-Scale Graph-Based Semi-Supervised Learning
While the harmonic function solution performs well in many semi-supervised learning (SSL) tasks, it is known to scale poorly with the number of samples. Recent successful and scalable methods, such as the eigenfunction method [11], focus on efficiently approximating the whole spectrum of the graph Laplacian constructed from the data. This is in contrast to various subsampling and quantization me...
Publication date: 2011